Fetch Reordering and Partitioning of Execution Resources for Atomic Instruction Blocks with Control Flow Assertions by Yen-Ting Tony

Authors

  • Yen-Ting Tony Tung
  • Brian Fahs
  • Matt Crum
  • Brian Slechta
  • Francesco Spadini
Abstract

The rePLay framework provides a mechanism upon which a variety of code optimizations can be deployed as an application executes. In this thesis, two optimizations are explored. First, the order in which instructions are fetched is optimized. Performance with an optimized schedule is shown to improve by 2.63%. The second optimization is to partition instructions for a clustered microarchitecture. This thesis demonstrates that when such a microarchitecture is implemented along with an optimized fetch schedule, performance still improves by 0.61%.


Similar resources

Out-of-Order Instruction Fetch Using Multiple Sequencers

Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide ...


Fetch Gating Control Through Speculative Instruction Window Weighting

In a dynamic reordering superscalar processor, the front-end fetches instructions and places them in the issue queue. Instructions are then issued by the back-end execution core. Till recently, the front-end was designed to maximize performance without considering energy consumption. The front-end fetches instructions as fast as it can until it is stalled by a filled issue queue or some other b...


Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures (IEEE/ACM International Symposium on Microarchitecture, 1996)

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, calle...


Fast approximately timed simulation

In this paper we present a technique for fast approximately timed simulation of software within a virtual prototyping framework. Our method performs a static analysis of the program control flow graph to construct annotations of the simulated program, combined with dynamic performance information. The static analysis estimates execution time based on a target architecture model. The delays intr...


Very-Wide-Issue Superscalar Microengine Configurations

To continue microprocessor performance improvements made in the last 2 decades, instruction-level parallelism must be exploited across multiple basic block boundaries. This necessity has led to execution engines which dynamically predict a stream of instructions which are executed concurrently. As issue widths increase, former assumptions about requirements for execution resources such as inter...




Publication date: 2001